Use fsck to detect corrupt file systems in reproducers #5518

FlorentRevest · 2024-11-18T19:18:04Z

This is a draft at the moment because I have a few unresolved questions such as:

whether this is the right direction architecturally speaking
how to express the dependency to fsck commands in syz-manager and the fsck_test (right now, if an fsck command is unavailable it will log a warning and move on)
more extensive test coverage (I only tried an ext2 and ext4 image but it could be nice to add more fs there)

dvyukov · 2024-11-19T09:24:32Z

Architecturally this looks good to me. cc @a-nogikh

We need to figure out what to do with error handling. Nobody is reading manager logs. It can log on error level, then these surface in out logs alerting, but if it happens repeatedly, it may be a problem. Is there any rate-limiting in our logs alerting?

Also now if I see "mount_in_repro", I don't really know if it means non-corrupted, or fsck failed. Do we want "mount_in_repro_non_corrupted"?

Maybe we could also append any errors to the log file. If there is a failure with fsck, we will always want to know more (why /how it failed).

pkg/asset/fsck.go

pkg/asset/fsck_test.go

pkg/asset/fsck.go

a-nogikh · 2024-11-19T10:26:28Z

Regarding how we could store this information in the DB and how we could check for corrupted images in the reporting rules (the question that popped up in offline discussions).

I'd vote for not splitting the dashapi.MountInRepro type into corrupted and non-corrupted images. Maybe we could add more flags/attributes into the Asset entity? E.g. whether it's corrupted, what's the fs type.

syzkaller/dashboard/app/entities_datastore.go

Lines 58 to 62 in 571351c

    
           type Asset struct { 
        
           	Type        dashapi.AssetType 
        
           	DownloadURL string 
        
           	CreateDate  time.Time 
        
           }

If we want to store fsck output logs, that could also go there as a text reference like in other places.

syzkaller/dashboard/app/entities_datastore.go

Line 355 in 571351c

Log int64 // reference to CrashLog text entity

I think that should be quite straightforward implementation-wise.

Checking for corrupted images in the Reporting filter is unfortunately much more tricky.

syzkaller/dashboard/app/config.go

Lines 274 to 275 in 571351c

    
           // Filter can be used to conditionally skip this reporting or hold off reporting. 
        
           Filter ReportingFilter `json:"-"`

syzkaller/dashboard/app/config.go

Line 369 in 571351c

ReportingFilter func(bug *Bug) FilterResult

First one important question. There can be multiple crashes per bug, some of which may include corrupted images, some may not. Sometimes there are also some unrelated syz_mount_image calls left, which could also be either corrupted or not.

What kinds of report filtering do we want to support w.r.t. fsck results?

Report if at least one of the reproducers mounts any non-corrupted fs image?
Only look at the images of the specific filesystem type? Do we actually know what fs type to look at?
Report only if there's at least one reproducer that mounts only non-corrupted images?

The answer to this question may affect the design considerations.

If we go with 3, we might also want to recalculate the crash priority depending on whether the associated reproducer mounts any corrupted fs images. See this method:

syzkaller/dashboard/app/api.go

Line 910 in 571351c

    
           func (crash *Crash) UpdateReportingPriority(c context.Context, build *Build, bug *Bug) {

It looks like we might have to transform the Reporting.Filter() callback to accept (*Bug, *Crash).

In the DB, we always reference the exact crash used in the specific reporting stage:

syzkaller/dashboard/app/entities_datastore.go

Line 323 in 571351c

CrashID int64 // crash that we've last reported in this reporting

So whenever we have to look at the previous stages, e.g. in the case below, we could pre-fetch all referenced crashes:

syzkaller/dashboard/app/reporting.go

Line 456 in 571351c

switch reporting.Filter(bug) {

There's a helper that makes it simpler:

syzkaller/dashboard/app/entities_datastore.go

Lines 1122 to 1123 in 571351c

    
           // dependencyLoader encapsulates the repetitive logic of mass loading referenced entities. 
        
           type dependencyLoader[T any] struct {

In the case when we decide whether to report or not and don't yet have CrashID set, I think we could first findCrashForBug and pass the candidate to needReport. That will likely make it much slower (unless we invent some optimizations), but it should work.

syzkaller/dashboard/app/reporting.go

Lines 78 to 84 in 571351c

    
           reporting, bugReporting, _, _, _, err := needReport(c, typ, state, bug) 
        
           if err != nil || reporting == nil { 
        
           	return nil, err 
        
           } 
        
           crash, crashKey, err := findCrashForBug(c, bug) 
        
           if err != nil { 
        
           	return nil, err

As part of google#5518, I'm adding fsck logs as annotation to the mounted file system assets. For this, I need a variety of fsck-like commands in the ci environment as well as eventually in the production environment.

FlorentRevest · 2024-11-28T12:55:45Z

Could you have another look ? I think we are now in a much better state in terms of logic to run fsck (I extended the syscall descriptions of file system mounts and expanded my tests range to 16 different file systems). But I'm still a bit unsure about the database storage story (I used a uint64 log entity but also piggy-backed on the assets list to render in the UI so it's a bit ugly that I had to introduce an asset type for something that is not actually uploaded like the other assets). Also I haven't explored much of the testing story yet since it's all over the place, let me know if you have any tests in mind you'd like to see.

The tests Should (TM) pass once #5549 is submitted and the env docker image published.

As part of google#5518, I'm adding fsck logs as annotation to the mounted file system assets. For this, I need a variety of fsck-like commands in the ci environment as well as eventually in the production environment.

As part of #5518, I'm adding fsck logs as annotation to the mounted file system assets. For this, I need a variety of fsck-like commands in the ci environment as well as eventually in the production environment.

a-nogikh

Looks good to me!
I've commented on the logic that is to be tested and what still needs to be adjusted.

prog/types.go

syz-manager/manager.go

prog/fsck.go

pkg/compiler/compiler.go

prog/fsck_test.go

dashboard/app/asset_storage.go

dashboard/app/entities_datastore.go

pkg/image/fsck_test.go

Syscall attributes are extended with a fsck command field which lets file system mount definitions specify a fsck-like command to run. This is required because all file systems have a custom fsck command invokation style. When uploading a compressed image asset to the dashboard, syz-manager also runs the fsck command and logs its output over the dashapi. The dashboard logs these fsck logs into the database. This has been requested by fs maintainer Ted Tso who would like to quickly understand whether a filesystem is corrupted or not before looking at a reproducer in more details. Ultimately, this could be used as an early triage sign to determine whether a bug is obviously critical.

dashboard/app/templates/templates.html

FlorentRevest marked this pull request as draft November 18, 2024 19:18

dvyukov reviewed Nov 19, 2024

View reviewed changes

a-nogikh reviewed Nov 19, 2024

View reviewed changes

pkg/asset/fsck.go Outdated Show resolved Hide resolved

a-nogikh mentioned this pull request Nov 19, 2024

syzkaller, syzbot: attach fsck results to bug reports #5494

Closed

FlorentRevest force-pushed the fsck branch 3 times, most recently from b78995a to 93e9e62 Compare November 28, 2024 11:48

FlorentRevest mentioned this pull request Nov 28, 2024

tools/docker: add various fsck-like commands to the environment #5549

Merged

FlorentRevest force-pushed the fsck branch from 93e9e62 to 6af3775 Compare November 28, 2024 12:50

FlorentRevest force-pushed the fsck branch 3 times, most recently from 9120da8 to feda84b Compare November 29, 2024 10:46

a-nogikh reviewed Dec 2, 2024

View reviewed changes

pkg/compiler: handle string syscall attributes

5a78e09

FlorentRevest force-pushed the fsck branch 2 times, most recently from d2a9001 to 6d08674 Compare December 4, 2024 20:51

a-nogikh previously approved these changes Dec 5, 2024

View reviewed changes

dashboard/app/entities_datastore.go Outdated Show resolved Hide resolved

pkg/image/fsck_test.go Outdated Show resolved Hide resolved

pkg/image/fsck_test.go Outdated Show resolved Hide resolved

FlorentRevest dismissed a-nogikh’s stale review via 24c479c December 6, 2024 14:33

FlorentRevest force-pushed the fsck branch from 6d08674 to 24c479c Compare December 6, 2024 14:33

FlorentRevest marked this pull request as ready for review December 6, 2024 14:33

a-nogikh reviewed Dec 6, 2024

View reviewed changes

dashboard/app/templates/templates.html Show resolved Hide resolved

a-nogikh approved these changes Dec 6, 2024

View reviewed changes

a-nogikh added this pull request to the merge queue Dec 9, 2024

Merged via the queue into google:master with commit deb7287 Dec 9, 2024
17 checks passed

Use fsck to detect corrupt file systems in reproducers #5518

Use fsck to detect corrupt file systems in reproducers #5518

Uh oh!

Conversation

FlorentRevest commented Nov 18, 2024

Uh oh!

dvyukov commented Nov 19, 2024

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

a-nogikh commented Nov 19, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

FlorentRevest commented Nov 28, 2024

Uh oh!

a-nogikh left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

a-nogikh commented Nov 19, 2024 •

edited

Loading